Expose delimiter character in JSON reader options to JSON reader APIs #17266

Open: shrshi wants to merge 8 commits into base branch-24.12

Conversation

@shrshi (Contributor) commented Nov 7, 2024

Description

Addresses #17261
Removes the delimiter symbol group from the whitespace normalization FST, since that FST runs after tokenization.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions bot added the libcudf label Nov 7, 2024
@shrshi added the bug, cuIO, and non-breaking labels Nov 7, 2024
@shrshi shrshi marked this pull request as ready for review November 7, 2024 14:17
@shrshi shrshi requested a review from a team as a code owner November 7, 2024 14:17
@shrshi shrshi requested review from vyasr and ttnghia November 7, 2024 14:17
@shrshi shrshi requested review from a team, davidwendt, karthikeyann and ttnghia and removed request for a team November 8, 2024 02:18
Comment on lines +75 to +76
template <typename SymbolT>
auto get_sgid_lut(SymbolT delim)
Contributor

nit: should this be a template function?
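
For context on the nit above, here is a minimal sketch of what a delimiter-parameterized symbol-group lookup could look like as a plain, non-template function. This is not the PR's code: the `symbol_group` enum, the group names, the 256-entry table layout, and the name `make_sgid_table` are all assumptions made for illustration, since the body of `get_sgid_lut` is not shown in this thread.

```cpp
#include <array>
#include <cstdint>

// Hypothetical symbol groups. The actual groups used by the PR's FSTs are
// not visible in this thread.
enum class symbol_group : std::uint8_t { delimiter = 0, quote = 1, escape = 2, other = 3 };

// Illustrative only: maps every byte value to a symbol group id, with the
// delimiter supplied by the caller rather than fixed at compile time.
std::array<std::uint8_t, 256> make_sgid_table(char delim)
{
  std::array<std::uint8_t, 256> table{};
  table.fill(static_cast<std::uint8_t>(symbol_group::other));
  table[static_cast<unsigned char>('"')]  = static_cast<std::uint8_t>(symbol_group::quote);
  table[static_cast<unsigned char>('\\')] = static_cast<std::uint8_t>(symbol_group::escape);
  // Assign the delimiter last so it takes precedence if it collides with
  // one of the symbols above.
  table[static_cast<unsigned char>(delim)] = static_cast<std::uint8_t>(symbol_group::delimiter);
  return table;
}
```

If the delimiter is only ever a single `char`, a plain parameter like this may be enough; a template parameter would mainly help if other symbol types had to be supported.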

@ttnghia ttnghia removed their assignment Nov 8, 2024
Labels
  • bug: Something isn't working
  • cuIO: cuIO issue
  • libcudf: Affects libcudf (C++/CUDA) code.
  • non-breaking: Non-breaking change
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[BUG] cudf::io::json::detail::normalize_single_quotes outputs incorrect result when the input has \n character
4 participants